Data compression having more effective compression

ABSTRACT

A lossless data compression system comprises a content addressable memory dictionary (30), a coder (38), and a run length encoding means (39) connected to receive the output of the coder (38), the encoding means (39) being arranged to count the number of times a match consecutively occurs at a predetermined dictionary location, i.e. the number of times the same search tuple is loaded into the same address (50) of the dictionary. Compression is improved.

[0001] This invention relates to a method and apparatus for the lossless compression of data.

[0002] While lossy data compression hardware has been available for image and signal processing for some years, lossless data compression has only recently become of interest, as a result of increased commercial pressure on bandwidth and cost per bit in data storage and data transmission; also, reduction in power consumption by reducing data volume is now of importance.

[0003] The principle of searching a dictionary and encoding data by reference to a dictionary address is well known, and the apparatus to apply the principle consists of a dictionary and a coder/decoder.

[0004] In Proceedings of EUROMICRO-22, 1996, IEEE, “Design and Performance of a Main Memory Hardware Data Compressor”, Kjelso, Gooch and Jones describe a novel compression method, termed the X-Match algorithm, which is efficient at compressing small blocks of data and suitable for high speed hardware implementation.

[0005] The X-Match algorithm maintains a dictionary of data previously seen, and attempts to match a current data element, referred to as a tuple, with an entry in the dictionary, replacing a matched tuple with a shorter code referencing the match location. The algorithm operates on partial matching, such as 2 bytes in a 4 byte data element. In Proceedings of EUROMICRO-25, 1999, IEEE, “The X-MatchLITE FPGA-Based Data Compressor”, Nunez, Feregrino, Bateman and Jones describe the X-Match algorithm implemented in a Field Programmable Gate Array (FPGA) prototype.

[0006] It is an object of the invention to provide a lossless data compression algorithm which can compress data more effectively than is possible with the published arrangement.

[0007] According to the invention, a lossless data compression system comprises a content addressable memory dictionary and a coder, and is characterised by run length encoding means connected to receive the output of the coder, said encoding means being arranged to count the number of times a match consecutively occurs at a predetermined dictionary location.

[0008] Also according to the invention, a lossless method of compressing data comprises the steps of:

[0009] comparing a search tuple of fixed length with a plurality of tuples of said fixed length stored in a dictionary;

[0010] indicating the location in the dictionary of a full or partial match or matches;

[0011] selecting a best match of any plurality of matches; and

[0012] encoding the match location and the match type;

[0013] characterised by the further steps of:

[0014] loading each search tuple in turn into the same address in the dictionary;

[0015] and counting the number of times identical tuples are matched consecutively into said address.

[0016] Preferably said same address is the first location in the dictionary.

[0017] In the drawings, FIG. 1 illustrates the architecture of a compressor arrangement published by Nunez et al.

[0018] The invention will be described by way of example only with reference to FIGS. 2-6 in which:

[0019] FIG. 2 illustrates the architecture of the compressor hardware;

[0020] FIG. 3 illustrates the run length internal encoder;

[0021] FIG. 4 illustrates a dictionary of varying size;

[0022] FIG. 5 illustrates in detail the run length internal coder/decoder; and

[0023] FIG. 6 illustrates the compressor/decompressor circuit schematically.

[0024] In the prior art as shown in FIG. 1, a dictionary 10 is based on Content Addressable Memory (CAM) and is searched by data 12 supplied by a search register 14. In the dictionary 10 each data element is exactly 4 bytes in width and is referred to as a tuple. With data elements of standard width, there is a guaranteed input data rate during compression and output data rate during decompression, regardless of data mix.

[0025] The dictionary stores previously seen data for a current compression; when the search register 14 supplies a new entry and a match is found in the dictionary, the data is replaced by a shorter code referencing the match location. CAM is a form of associative memory which takes in a data element and gives a match address of the element as its output. The use of CAM technology allows rapid searching of the dictionary 10, because the search is implemented simultaneously at every address at which data is stored, and therefore simultaneously for every stored word.

[0026] In the X-Match algorithm, perfect matching is not essential. A partial match, which may be a match of 2 or 3 of the 4 bytes, is also replaced by the code referencing the match location and a match type code, with the unmatched byte or bytes being transmitted literally, everything prefixed by a single bit. This use of partial matching improves the compression ratio when compared with the requirement of 4 byte matching, but still maintains high throughput of the dictionary.

[0027] The match type indicates which bytes of the incoming tuple were found in the dictionary and which bytes have to be concatenated in literal form to the compressed code. There are 11 different match types that correspond to the different combinations of 2, 3 or 4 bytes being matched. For example 0000 indicates that all the bytes were matched (full match) while 1000 indicates a partial match where bytes 0, 1 and 2 were matched but byte 3 was not, and must be added as an uncompressed literal to the code. Since some match types are more frequent than others a static Huffman code based on the statistics obtained through extensive simulation is used to code them. For example the most popular match type is 0000 (full match) and the corresponding Huffman code is 01. On the other hand a partial match type 0010 (bytes 3, 2 and 0 match) is more infrequent so the corresponding Huffman code is 10110. This technique improves compression.
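The match type mask can be illustrated with a minimal Python sketch. It assumes that byte 0 is the first byte of the tuple and that the leftmost character of the mask refers to byte 3, which is how the examples 1000 and 0010 read; the function names are hypothetical and are not taken from the published design.

```python
from itertools import combinations
from typing import List

TUPLE_BYTES = 4  # tuple width stated in the text


def match_type(search: bytes, entry: bytes) -> str:
    """Return a 4-character mask; the leftmost character is byte 3.

    '0' marks a matched byte and '1' an unmatched byte, following the
    examples in the text (0000 = full match, 1000 = byte 3 unmatched).
    Assumption: index 0 of the bytes object is byte 0 of the tuple.
    """
    bits = ["0" if search[i] == entry[i] else "1" for i in range(TUPLE_BYTES)]
    return "".join(reversed(bits))


def valid_match_types() -> List[str]:
    """Enumerate every mask in which at least 2 of the 4 bytes match."""
    masks = []
    for unmatched in range(0, 3):  # 0, 1 or 2 unmatched bytes
        for positions in combinations(range(TUPLE_BYTES), unmatched):
            bits = ["1" if i in positions else "0" for i in range(TUPLE_BYTES)]
            masks.append("".join(reversed(bits)))
    return masks
```

Counting the masks returned by `valid_match_types()` gives 1 + 4 + 6 = 11, which agrees with the number of match types stated above.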

[0028] If, for example, the search tuple is CAT, and the dictionary contains the word SAT at position 2, the partial match will be indicated in the format (match/miss) (location) (match type) (literals required) which in this example would be 022C, binary code 0 000010 0010 1000011, i.e. the capital C is not matched, and is sent literally to the coding part of the system.

[0029] The algorithm, in pseudo code, is given as:

[0030] Set the dictionary to its initial state;

[0031] DO {
    read in tuple T from the data stream;
    search the dictionary for tuple T;
    IF (full or partial hit) {
        determine the best match location ML and the match type MT;
        output ‘0’;
        output Binary code for ML;
        output Huffman code for MT;
        output any required literal characters of T;
    } ELSE {
        output ‘1’;
        output tuple T;
    }
    IF (full hit) {
        move dictionary entries 0 to ML-1 down by one location;
    } ELSE {
        move all dictionary entries down by one location;
    }
    copy tuple T to dictionary location 0;
}

[0032] WHILE (more data is to be compressed);

[0033] The dictionary 10 is arranged on a Move-To-Front strategy, i.e. a current tuple is placed at the front of the dictionary and other tuples moved down by one location to make space. If the dictionary becomes full, a Least Recently Used (LRU) policy applies, i.e., the tuple occupying the last location is simply discarded.
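The Move-To-Front update with LRU discard can be sketched in Python as follows. This is a software illustration under the assumption that the dictionary is a fixed-length list; the function name and the example contents are hypothetical.

```python
from typing import List, Optional


def move_to_front(dictionary: List[bytes], tuple_t: bytes,
                  full_hit_at: Optional[int]) -> None:
    """Update a fixed-size dictionary in place.

    full_hit_at is the address of a full match, or None for a miss or a
    partial match.
    """
    # On a full hit only entries 0..ML-1 shift down, overwriting the
    # matched entry; otherwise every entry shifts and the entry in the
    # last location is discarded.
    last = full_hit_at if full_hit_at is not None else len(dictionary) - 1
    for address in range(last, 0, -1):
        dictionary[address] = dictionary[address - 1]
    dictionary[0] = tuple_t
```

For example, with hypothetical contents `[b"the_", b"at_i", b"aaaa", b"bbbb"]`, a full hit on `b"at_i"` at address 1 yields `[b"at_i", b"the_", b"aaaa", b"bbbb"]`, and a subsequent miss on `b"ry__"` yields `[b"ry__", b"at_i", b"the_", b"aaaa"]`.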

[0034] The dictionary is preloaded with common data.

[0035] The coding function for a match is required to code three separate fields, i.e.

[0036] (a) the match location in the dictionary 10; a uniform binary code, in which the codes are of the fixed length log₂(DICTIONARY_SIZE), is used.

[0037] (b) a match type; i.e. which bytes of an incoming tuple match in a dictionary location; a static Huffman code is used.

[0038] (c) any extra characters which did not match the dictionary entry, transmitted in literal form.
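The three fields can be assembled as in the following Python sketch. It is illustrative only: the Huffman table holds just the two codes quoted in the text (the full static table is not given here), literals are assumed to be sent as 8 bits each, the default dictionary size of 64 is an assumption, and the function name is hypothetical.

```python
from math import log2

# Only the two static Huffman codes quoted in the text are listed;
# the remaining nine entries are not given in the source.
HUFFMAN_MATCH_TYPE = {"0000": "01", "0010": "10110"}


def encode_match(location: int, match_type: str, literals: bytes,
                 dictionary_size: int = 64) -> str:
    """Return the code for a hit as a string of '0'/'1' characters."""
    location_bits = int(log2(dictionary_size))  # fixed-length field (a)
    code = "0"  # single prefix bit: 0 marks a match
    code += format(location, f"0{location_bits}b")  # field (a)
    code += HUFFMAN_MATCH_TYPE[match_type]  # field (b)
    # Field (c): unmatched bytes in literal form (8 bits each, assumed).
    code += "".join(format(byte, "08b") for byte in literals)
    return code
```

A full match at location 2 in a 64-entry dictionary is then coded in 1 + 6 + 2 = 9 bits, compared with 32 bits for the uncompressed tuple.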

[0039] Referring again to FIG. 1, the match, or partial match or several partial matches, are output by the dictionary 10 to a match decision logic circuit 16, which supplies a main coder 18 which provides a coded signal to an output assembler 20 which provides a compressed data output signal 22. A shift control logic 24 connected between the match decision logic 16 and the dictionary 10 provides shift signals to the dictionary. The whole circuit can be provided on a single semiconductor chip.

[0040] Referring now to a compressor according to the invention as illustrated in FIG. 2, a dictionary 30 is based on CAM technology and is supplied with data to be searched 32 by a search register 34. The dictionary searches in accordance with the X-Match algorithm, and is organised on a Move To Front strategy and Least Recently Used Replacement policy.

[0041] The dictionary output is connected to a match decision logic circuit 36 which is connected to a main coder 38, which provides signals to a coder 39 which will be referred to as a ‘Run Length Internal’ (RLI) coder, which provides signals to an output assembler 40. The assembler 40 provides an output stream of compressed data 42.

[0042] It is to be understood that, while it is known to apply run length encoding to data before it is coded, it has not previously been suggested that a run length encoder is positioned between the main coder and the output assembler in a data compression system.

[0043] FIG. 3 illustrates the coder output and dictionary adaptation processes during normal and RLI coding events. Eight steps are shown; for each step the top four dictionary addresses 0, 1, 2, 3, references 50, 52, 54, 56, are shown, with the addresses 58 shown on the left and an adaptation vector 60 shown on the right. It will be seen that each location content is exactly 4 bytes long.

[0044] Dictionary address 3, reference 56, is a reserved location and is used to signal RLI runs; an internal run counter 62 is shown adjacent to address 3.

[0045] In each of the eight steps, a previous search tuple is loaded into address 0, reference 50, and the previously stored data is shifted down one position. This is indicated by the current adaptation vector on the right hand side of location 0 being set to 1 in all eight steps. If there is not a full match, the data in the last location is deleted to make room for a new tuple.

[0046] The arrows pointing downwards within the dictionary, such as the arrows A, indicate rearrangement of the dictionary at the end of each step under the control of the adaptation vector 60 of that step.

[0047] Associated with each step there is an output box 64 which indicates the output of the dictionary 30 for that step.

[0048] In step 1, the search tuple is “at_i”; a full match is found at address 1, reference 52, and the output in box 64 indicates this. The first entry in the box “1” indicates that a match has been found; the next entry indicates a match address; the third entry indicates the match type, i.e. “0” because the match is a full match. The fourth entry is blank, because, with a full match, there are no literals to be transmitted.

[0049] The dictionary is updated in accordance with the adaptation vector 60; a bit setting of “1” indicates “load data from previous position” and a bit setting of “0” indicates “keep current data”; therefore the entry at address 0, reference 50, is replaced by the search tuple “at_i” and the entry at address 1, reference 52, is replaced by “the”; the entry at address 2, reference 54, is unchanged.
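The effect of the adaptation vector can be shown with a short Python sketch. It assumes that the "previous position" of address 0 is the search register, which is how step 1 reads; the function name and the dictionary contents in the example are hypothetical.

```python
from typing import List, Sequence


def apply_adaptation_vector(dictionary: Sequence[str],
                            vector: Sequence[int],
                            search_tuple: str) -> List[str]:
    """Return the dictionary after one adaptation step.

    A vector bit of 1 loads data from the previous position (the search
    tuple in the case of address 0); a bit of 0 keeps the current data.
    """
    # Read from the old contents and write into a copy, so that every
    # entry is taken from the state before the update.
    updated = list(dictionary)
    for address, bit in enumerate(vector):
        if bit == 1:
            updated[address] = (search_tuple if address == 0
                                else dictionary[address - 1])
    return updated
```

With hypothetical contents `["the_", "at_i", "aaaa", "bbbb"]`, the vector `[1, 1, 0, 0]` and the search tuple `"at_i"` give `["at_i", "the_", "aaaa", "bbbb"]`, which mirrors the full match at address 1 in step 1. A vector of all ones moves every entry down one address, as happens on a miss.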

[0050] In step 2, the search tuple is “ry_”; there is no match, i.e. a miss, and the output box 64 indicates that there is no match, i.e. the first entry is “0”; the address and match type entries are blank, and the literals to be sent are “ry_”.

[0051] The adaptation vector 60 updates the dictionary as indicated by the arrows A, that is, all entries move down one address.

[0052] In step 3 the search tuple is “this” and a partial match is found at address 2; the output box 64 indicates that there is a match, that the match is at address 2, that the match type is a partial match (i.e. the setting is “3”), and that the non-matching part, i.e. the literals to be sent, is “is”. The dictionary is updated.

[0053] In step 4, the search tuple is “at_i”, and a full match is found at address 2 as indicated in the output box 64.

[0054] In step 5, the search tuple is again “at_i”, and a match is found at address 0; this is indicated in the output box 64.

[0055] Because the same tuple has been repeated, the internal run counter 62, which has remained at a zero setting in the previous steps, is now set to 1; a possible run is indicated, but a normal output is still given, box 64, because a run is not yet certain.

[0056] In step 6, the search tuple is again “at_i”; the internal run counter 62 is incremented to 2. This time a valid run is indicated and there is no output, so the output box 64 is blank. Also the output corresponding to step 5 is emptied from the RLI coding register since it will now be coded as part of the RLI event.

[0057] In step 7 the search tuple is once more “at_i”, the internal run counter is incremented to 3, and the output box 64 remains blank.

[0058] In step 8 the search tuple is “at_v”; the internal run has ended. A partial match is found at address 0; the output box 64 indicates that the match is found at address 0, that the match type is partial, and that the literal to be sent is (v).

[0059] The count of the internal run counter 62 is now sent as shown in the RLI output box 66. A match was found at address 3, reference 56, i.e. the address reserved for internal runs, and the length of the run was 3, which is sent as an 8-bit code.

[0060] Although the arrangement is such that one dictionary address is lost (because it is reserved to signal RLI codes) the improvement in compression, which may be 10%, more than compensates for the one-word loss in dictionary size.

[0061] It is to be understood that internal run length encoding only operates with full matches, and not with partial matches. It will also be understood that full matches of 4 bytes of data can be detected. This is in contrast to the arrangement disclosed in the publication by Kjelso referred to above in which a run length encoder sensitive only to 0s is disclosed; runs of 0 are common in coding arrangements. In addition, the position of the prior art encoder was such that it preceded application of the X-Match encoder, i.e. it operated on incoming data before the data was supplied to the dictionary in which the X-Match algorithm is applied. In the inventive arrangement, the run length encoding is integrated with the dictionary coding and does not precede it.

[0062] The inventive arrangement has two distinct features; the first is that its contents can be searched in a single cycle, and extra logic is added to a conventional content addressable memory to allow it to detect consecutive input sequences which are identical; this is achieved by transmission of a dictionary address which has not yet been utilised for the storage of dictionary data; this is described above. A second feature is that the dictionary size and the codes which indicate multiple consecutive input sequences are varied dynamically, based on the number of new data items to be entered into the dictionary; in other words, the size of the dictionary varies.

[0063] This is illustrated in FIG. 4 which shows the same dictionary features as FIG. 3, but also shows 8 dictionary locations 50-56 and 51-57. In step 1 all the dictionary locations are set to the same data value, which in effect declares invalid all the dictionary locations below the first location 50, without the need for additional “dictionary location valid” logic. The reason is that in the case of multiple full matches during a dictionary search, the best match decision logic always selects the match closer to the top of the dictionary, thus invalidating all locations below it. The locations are all set to zero in the example.

[0064] In the first step, the code word book only has 2 values, corresponding to the first location 50, and to the RLI location, which at this stage is at location 52.

[0065] If, for example, the input data to the dictionary consists of 1020 bytes of data all of value zero, the dictionary does not grow in length, and the RLI code will be activated once to code a run of 255 tuples for the total of 1020 bytes. The run is counted by RLI counter 62 as described with reference to FIG. 3.

[0066] The output of the coder will be:

[0067] 0 1 11111111 (10 bits).

[0068] 0 => match; 1 => dictionary location (only two valid locations); 11111111 => run length of 255.
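The make-up of this 10-bit code can be reproduced with a small Python sketch. The width of the location field is modelled here as ceil(log2(n)) bits for n valid locations, which is a simplifying assumption and may differ from the exact phased binary code; the function name is hypothetical.

```python
from math import ceil, log2


def rli_code(valid_locations: int, run_length: int) -> str:
    """Return '0' + reserved-location address + 8-bit run length."""
    # Simplifying assumption: the location field is ceil(log2(n)) bits
    # wide when n locations are valid; a true phased binary code may
    # assign shorter codes to some locations.
    width = max(1, ceil(log2(valid_locations)))
    reserved = valid_locations - 1  # the reserved location is the last valid one
    return "0" + format(reserved, f"0{width}b") + format(run_length, "08b")
```

With two valid locations and a run of 255 tuples, `rli_code(2, 255)` returns `"0111111111"`, i.e. one match bit, one location bit and eight run length bits.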

[0069] In Step 1 the search tuple is “at_i”, which is output as a literal.

[0070] In Step 2 “at_i” has been stored in dictionary location 50, and the search tuple is “ry_”; the dictionary now has three valid locations, the location reserved to signal RLI runs having been moved from location 52 to location 54.

[0071] In Step 3, the search tuple is “this” and there are four valid locations. In Step 4, the search tuple is “at_i” and there are five valid locations, the reserved location now being at location 51.

[0072] Steps 5, 6, 7 and 8 indicate the effect of a repeated tuple; the dictionary remains at a length of 5 valid locations, with the reserved location at 51.

[0073] If, after step 8, a new search tuple is presented, the dictionary will grow in size to store it.

[0074] The maximum compression ratio enabled by the combination of RLI and PBC (Phased Binary Coding) is 10/(1020*8) ≈ 0.00123 (816:1). Of course this is a maximum theoretical limit that will only be achieved when the data byte repeats itself for the entire length of the block, but it illustrates the advantage of combining an internal run length coder with a move to front growing dictionary model. In general RLI will use PBC to its advantage as long as the dictionary is not completely full and a run of length greater than 2 takes place. If all the dictionary locations are valid, using PBC or UBC (Uniform Binary Coding) gives the same results. Another prefix-free coding technique, such as Rice coding or Phased Huffman Coding, can be used to replace PBC where a fraction of the dictionary is valid initially, and the same principles apply.

[0075] The algorithm, in pseudo code, is given as:

[0076] Set the dictionary to its initial state;

[0077] Set the next free location counter = 2;

[0078] Run length count = 0;

[0079] DO {
    read in tuple T from the data stream;
    search the dictionary for tuple T;
    IF (full hit at location zero) {
        increment run length count by one;
    } ELSE {
        IF (run length count = 1) {
            output “0”;
            output phased binary code for ML 0;
            output Huffman code for MT 0;
        }
        IF (run length count > 1) {
            output “0”;
            output phased binary code for ML NEXT_FREE_LOCATION−1;
            output Binary code for run length;
        }
        set run length count to 0;
        IF (full or partial hit) {
            determine the best match location ML and the match type MT;
            output “0”;
            output phased binary code for ML;
            output Huffman code for MT;
            output any required literal characters of T;
        } ELSE {
            output “1”;
            output tuple T;
        }
    }
    IF (full hit) {
        move dictionary entries 0 to ML-1 down by one location;
    } ELSE {
        move all dictionary entries down by one location;
        increase next free location counter by one;
    }
    copy tuple T to dictionary location 0;
}

[0080] WHILE (more data is to be compressed);
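The pseudo code above can be modelled in software as in the following Python sketch. It is a simplified illustration and not the hardware implementation: output is a list of symbolic tokens rather than a bit stream, the match type is reduced to the number of matched bytes, the next free location counter is capped at the dictionary size, overflow of the 8-bit run counter is not modelled, and a run still pending at the end of the data is flushed (the pseudo code does not show this step). All names are hypothetical.

```python
from typing import List, Optional, Tuple

TUPLE_BYTES = 4


def best_match(dictionary: List[bytes], valid: int,
               t: bytes) -> Optional[Tuple[int, int]]:
    """Return (location, matched_bytes) of the best hit, or None on a miss."""
    best = None
    for loc in range(valid):
        matched = sum(a == b for a, b in zip(t, dictionary[loc]))
        # At least 2 bytes must match; ties go to the location nearest the top.
        if matched >= 2 and (best is None or matched > best[1]):
            best = (loc, matched)
    return best


def compress(data: bytes, size: int = 16) -> list:
    """Return symbolic tokens: MISS, MATCH (loc, matched, literals), RUN."""
    assert len(data) % TUPLE_BYTES == 0
    dictionary = [bytes(TUPLE_BYTES)] * (size - 1)  # all locations preset to zero
    next_free = 2  # location 0 plus the reserved RLI location
    run = 0
    out = []

    def flush_run() -> None:
        nonlocal run
        if run == 1:  # a single repeat is sent as a normal full match
            out.append(("MATCH", 0, TUPLE_BYTES, b""))
        elif run > 1:  # a run is signalled at the reserved location
            out.append(("RUN", next_free - 1, run))
        run = 0

    for i in range(0, len(data), TUPLE_BYTES):
        t = data[i:i + TUPLE_BYTES]
        hit = best_match(dictionary, next_free - 1, t)
        full_at = None
        if hit == (0, TUPLE_BYTES):
            run += 1
            full_at = 0
        else:
            flush_run()
            if hit is not None:
                loc, matched = hit
                literals = bytes(a for a, b in zip(t, dictionary[loc]) if a != b)
                out.append(("MATCH", loc, matched, literals))
                if matched == TUPLE_BYTES:
                    full_at = loc
            else:
                out.append(("MISS", t))
        # Dictionary adaptation: move to front, growing on anything but a full hit.
        if full_at is None:
            last = len(dictionary) - 1
            next_free = min(next_free + 1, size)
        else:
            last = full_at
        for address in range(last, 0, -1):
            dictionary[address] = dictionary[address - 1]
        dictionary[0] = t
    flush_run()  # assumption: a run pending at end of data is flushed
    return out
```

For a block of 1020 zero bytes the sketch returns the single token `("RUN", 1, 255)`. For the hypothetical input `b"at_i" + b"ry__" + b"this" + b"at_i" * 4 + b"at_v"` it returns three misses, a full match at location 2, `("RUN", 4, 3)` and a partial match at location 0 with the literal `b"v"`.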

[0081] FIG. 5 illustrates the operation of a RLI coder and a RLI decoder.

[0082] During compression, as described with reference to FIG. 3, the counter 62 is activated by a full match at location 0; the counter remains enabled and counting while consecutive full matches at 0 are being detected. When the run terminates, the count is concatenated to the rest of the RLI code formed by a 0 indicating a match and the reserved position corresponding to the last active position in the dictionary.

[0083] During decompression the counter 62 is loaded with the count from the RLI code and then begins to count, starting at zero, until the loaded value is reached. The output of the RLI decoder is full match at location 0 while the count value is not reached.
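The decoder behaviour can be sketched in Python as follows. The token shapes are assumptions made for illustration: `("RUN", reserved_location, count)` stands for an RLI code and `("MATCH", location, matched_bytes, literals)` for a normal match; the function name is hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def expand_rli(codes: Iterable[Tuple]) -> Iterator[Tuple]:
    """Yield normal codes, replacing each RLI code by repeated full matches."""
    for code in codes:
        if code[0] == "RUN":
            loaded_count = code[2]  # count loaded from the RLI code
            counter = 0  # counts up from zero until the loaded value is reached
            while counter < loaded_count:
                yield ("MATCH", 0, 4, b"")  # full match at location 0
                counter += 1
        else:
            yield code
```

A run code with a count of 3 therefore expands to three full matches at location 0, and every other code passes through unchanged.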

[0084] The RLI coder 39 comprises a RLI coding register 70 and RLI coding control unit 72, which is connected to RLI counter 62 (see FIG. 3). Counter 62 is an 8-bit register and is common to both compression and decompression. The 8-bit counter 62 is connected to a RLI decoding control unit 74 in an RLI decoder 76 which also contains a RLI decoding register 78.

[0085] The RLI coding register 70 buffers code before the code accesses the RLI coding control unit 72; unit 72 controls the RLI coding process and outputs the correct code/code length pair depending on whether the compression is operating normally, or whether a run length coding event is taking place.

[0086] When the RLI coder 39 becomes active, the RLI coding register is emptied of the previous code, and output is frozen while the run takes place.

[0087] In the RLI decoder 76, the RLI decoding control unit 74 has a complementary function to the RLI coding control unit 72; unit 74 outputs the correct match location/match type pair depending on whether the circuit is operating normally, i.e. on individual bytes, or if run length decoding is taking place.

[0088] The RLI decoding register 78 has the same functionality as the RLI coding register 70.

[0089] The 8 bit RLI counter 62 does not use any specific technique to detect an overflow condition if a pattern repeats more than 255 times. The counter simply loops back to 0, the condition is detected by the RLI control logic 72 as the end of a run, and a run length code is output. The next code after an RLI code event is always a normal code, even when the pattern continues to repeat. With a continued repeat, the counter 62 exceeds the count of 1 again and the run length detection signal is reactivated.

[0090] During decompression, the fact that no two RLI codes can be consecutive is used to load the RLI count into the RLI decoder 76 only once. This mode of operation simplifies the RLI control units.

[0091] A detailed coder/decoder circuit is shown in FIG. 6.

[0092] Uncompressed data 32 is supplied to the CAM dictionary 30, and the dictionary output, i.e. an indication of the dictionary address at which a match has been found, or the address of a partial match plus the unmatched byte or bytes, is supplied to a priority logic circuit 80, which assigns a different priority to each of the different types of possible matches in the dictionary, i.e. full, partial or miss, and supplies the result to a match decision logic circuit 82. Circuit 82 uses the priority information to select one of the matches as the best for compression and supplies a signal to a main coder 38.
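The selection of a best match can be illustrated with a brief Python sketch. The ordering used here is an assumption: more matched bytes rank higher (so a full match outranks any partial match), and ties are resolved in favour of the address nearest the top of the dictionary. The relative priorities of the individual partial match types in the hardware are not specified in this description, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def select_best(candidates: List[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Pick the best (address, matched_bytes) candidate, or None for a miss."""
    # Fewer than 2 matched bytes does not count as a hit.
    hits = [c for c in candidates if c[1] >= 2]
    if not hits:
        return None
    # Highest matched-byte count first, then the lowest address.
    return min(hits, key=lambda c: (-c[1], c[0]))
```

For instance, `select_best([(0, 2), (1, 4), (3, 4)])` returns `(1, 4)`: both full matches outrank the partial match at address 0, and the one nearer the top is chosen.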

[0093] The main coder 38 operates, as described in the prior art referred to above, to assign a uniform binary code to the matching location and static Huffman code to the match type, and concatenates any necessary bytes in literal form. The compressed output is supplied to the RLI coder 39, described with reference to FIG. 5. This signal is produced by the main coder but is not shown in its diagram for simplicity. The RLI coder output passes to a bit assembly logic 40 which writes a new 64-bit compressed output to memory whenever more than 64 bits of compressed data are valid in an internal buffer (not shown). The output is compressed code 42.

[0094] The output from the priority logic circuit 80 is also supplied to an out-of-date adaptation (ODA) logic circuit 84, as described in our co-pending patent application no GB 0001711.1 filed on even date. The output of the ODA circuit 84 is connected to a move generation logic circuit 44 which generates a move vector (as the adaptation vector applied in FIG. 3) depending on the match type and match location. The move generation logic 44 also provides a feedback signal to the ODA logic circuit 84. (NB out-of-date adaptation is not shown in FIG. 3 for simplicity.)

[0095] For decompression, compressed input 90 is supplied to a bit disassembly logic circuit 92 which reads a new 64-bit compressed vector from memory whenever fewer than 33 bits are left valid in an internal buffer (not shown) after a decompression operation. The compressed vector is supplied to a main decoder 94 which decodes the match location and match type, together with any required literal characters, and detects any possible RLI codes. The decoder 94 is connected to the RLI decoder 76 which supplies its run length decoded output to the ODA logic circuit 84 and also to a tuple assembly circuit 96.

[0096] The CAM dictionary 30 operates on the decoded input to regenerate 4 byte wide words which are supplied to the tuple assembly circuit 96; this circuit supplies uncompressed data 98, which comprises tuples assembled using information from the dictionary 30, plus any literal characters present in the code.

[0097] Application of Run Length Internal coding according to the invention has been found to achieve the compression improvement, which may be 10%, with little or no effect on the speed of compression. The improvement results from the efficient run length encoding of any repeating pattern, such as a 32 bit pattern. The most common repeating pattern is a run of 0s, but others are possible such as the space character in a text file or a constant background colour in a picture. Application of the invention allows efficient, lossless coding and decoding of such non-zero characters.

[0098] The Least Recently Used dictionary maintenance policy forces any repeating pattern to be located at position zero in the dictionary 30. Run Length Internal coding detects and codes any vector which is fully matched at position zero twice or more.

[0099] Such an arrangement offers a compression advantage in comparison with locating a run length encoder before the dictionary in a compression system, and since it uses the dictionary logic, complexity is kept to a minimum with a higher level of integration in the architecture.

[0100] The CAM dictionary 30 can have 15, 31 or 63 words; one position is already reserved for RLI events. A bigger dictionary improves compression but increases complexity significantly.

[0101] The uncompressed data-out 98 is identical to the data-in 32. There has been no loss.

[0102] The present invention is likely to find application when small blocks of data are to be compressed.

1. A lossless data compression system comprising a content addressable memory dictionary 30 and a coder 38, characterised by run length encoding means 39 connected to receive the output of the coder 38, said encoding means 39 being arranged to count the number of times a match consecutively occurs at a predetermined dictionary location.
2. A system according to claim 1 in which the dictionary 30 is arranged so that at each search step a search tuple is loaded into the same address 50 of the dictionary.
3. A system according to claim 2 in which the run length encoder register means 39 is arranged to count the number of times the same search tuple is loaded into the same address 50 of the dictionary 30.
4. A system according to claim 2 or claim 3 in which a further address 56 in the dictionary 30 is reserved to indicate the number of times a search tuple is repeated.
5. A system according to claim 4 in which the further address varies in accordance with the size of the dictionary.
6. A system according to any preceding claim in which the dictionary 30 is arranged to hold data elements which are all of precisely equal length and each dictionary entry holds multiple data elements.
7. The system according to claim 6 in which each dictionary entry holds 4 data elements.
8. A system according to any preceding claim in which consecutive matches are indicated by transmission of a dictionary address which is not yet utilised for storage of dictionary data.
9. A lossless data decompression system comprising a content addressable memory dictionary 30 and a decoder 94, characterised by run length decoder register means 76 connected to receive the output of decoder 94.
10. A lossless method of compressing data comprising the steps of: comparing a search tuple of fixed length with a plurality of tuples of said fixed length stored in a dictionary; indicating the location in the dictionary of a full or partial match or matches; selecting a best match of any plurality of matches; and encoding the match location and the match type; characterised by the further steps of: loading each search tuple in turn into the same address in the dictionary; and counting the number of times identical tuples are matched consecutively into said address.